Goto

Collaborating Authors

 Saint Joseph County


Does Local News Stay Local?: Online Content Shifts in Sinclair-Acquired Stations

Wanner, Miriam, Hager, Sophia, Field, Anjalie

arXiv.org Artificial Intelligence

Local news stations are often considered to be reliable sources of non-politicized information, particularly local concerns that residents care about. Because these stations are trusted news sources, viewers are particularly susceptible to the information they report. The Sinclair Broadcast group is a broadcasting company that has acquired many local news stations in the last decade. We investigate the effects of local news stations being acquired by Sinclair: how does coverage change? We use computational methods to investigate changes in internet content put out by local news stations before and after being acquired by Sinclair and in comparison to national news outlets. We find that there is clear evidence that local news stations report more frequently on national news at the expense of local topics, and that their coverage of polarizing national topics increases.


Exploring Conversational Design Choices in LLMs for Pedagogical Purposes: Socratic and Narrative Approaches for Improving Instructor's Teaching Practice

Chen, Si, Molnar, Isabel R., Li, Peiyu, Acunin, Adam, Hua, Ting, Ambrose, Alex, Chawla, Nitesh V., Metoyer, Ronald

arXiv.org Artificial Intelligence

Large language models (LLMs) typically generate direct answers, yet they are increasingly used as learning tools. Studying instructors' usage is critical, given their role in teaching and guiding AI adoption in education. We designed and evaluated TeaPT, an LLM for pedagogical purposes that supports instructors' professional development through two conversational approaches: a Socratic approach that uses guided questioning to foster reflection, and a Narrative approach that offers elaborated suggestions to extend externalized cognition. In a mixed-method study with 41 higher-education instructors, the Socratic version elicited greater engagement, while the Narrative version was preferred for actionable guidance. Subgroup analyses further revealed that less-experienced, AI-optimistic instructors favored the Socratic version, whereas more-experienced, AI-cautious instructors preferred the Narrative version. We contribute design implications for LLMs for pedagogical purposes, showing how adaptive conversational approaches can support instructors with varied profiles while highlighting how AI attitudes and experience shape interaction and learning.


A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives

Zhao, Kaixiang, Li, Lincan, Ding, Kaize, Gong, Neil Zhenqiang, Zhao, Yue, Dong, Yushun

arXiv.org Artificial Intelligence

Machine learning (ML) models have significantly grown in complexity and utility, driving advances across multiple domains. However, substantial computational resources and specialized expertise have historically restricted their wide adoption. Machine-Learning-as-a-Service (MLaaS) platforms have addressed these barriers by providing scalable, convenient, and affordable access to sophisticated ML models through user-friendly APIs. While this accessibility promotes widespread use of advanced ML capabilities, it also introduces vulnerabilities exploited through Model Extraction Attacks (MEAs). Recent studies have demonstrated that adversaries can systematically replicate a target model's functionality by interacting with publicly exposed interfaces, posing threats to intellectual property, privacy, and system security. In this paper, we offer a comprehensive survey of MEAs and corresponding defense strategies. We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments. Our analysis covers various attack techniques, evaluates their effectiveness, and highlights challenges faced by existing defenses, particularly the critical trade-off between preserving model utility and ensuring security. We further assess MEAs within different computing paradigms and discuss their technical, ethical, legal, and societal implications, along with promising directions for future research. This systematic survey aims to serve as a valuable reference for researchers, practitioners, and policymakers engaged in AI security and privacy. Additionally, we maintain an online repository continuously updated with related literature at https://github.com/kzhao5/ModelExtractionPapers.


Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era

Li, Dawei, Huang, Yue, Li, Ming, Zhou, Tianyi, Zhang, Xiangliang, Liu, Huan

arXiv.org Artificial Intelligence

Generative models such as Large Language Models, Diffusion Models, and generative adversarial networks have recently revolutionized the creation of synthetic data, offering scalable solutions to data scarcity, privacy, and annotation challenges in data mining. This tutorial introduces the foundations and latest advances in synthetic data generation, covers key methodologies and practical frameworks, and discusses evaluation strategies and applications. Attendees will gain actionable insights into leveraging generative synthetic data to enhance data mining research and practice. More information can be found on our website: https://syndata4dm.github.io/.


Intellectual Property in Graph-Based Machine Learning as a Service: Attacks and Defenses

Li, Lincan, Shen, Bolin, Zhao, Chenxi, Sun, Yuxiang, Zhao, Kaixiang, Pan, Shirui, Dong, Yushun

arXiv.org Artificial Intelligence

Graph-structured data, which captures non-Euclidean relationships and interactions between entities, is growing in scale and complexity. As a result, training state-of-the-art graph machine learning (GML) models have become increasingly resource-intensive, turning these models and data into invaluable Intellectual Property (IP). To address the resource-intensive nature of model training, graph-based Machine-Learning-as-a-Service (GMLaaS) has emerged as an efficient solution by leveraging third-party cloud services for model development and management. However, deploying such models in GMLaaS also exposes them to potential threats from attackers. Specifically, while the APIs within a GMLaaS system provide interfaces for users to query the model and receive outputs, they also allow attackers to exploit and steal model functionalities or sensitive training data, posing severe threats to the safety of these GML models and the underlying graph data. To address these challenges, this survey systematically introduces the first taxonomy of threats and defenses at the level of both GML model and graph-structured data. Such a tailored taxonomy facilitates an in-depth understanding of GML IP protection. Furthermore, we present a systematic evaluation framework to assess the effectiveness of IP protection methods, introduce a curated set of benchmark datasets across various domains, and discuss their application scopes and future challenges. Finally, we establish an open-sourced versatile library named PyGIP, which evaluates various attack and defense techniques in GMLaaS scenarios and facilitates the implementation of existing benchmark methods. The library resource can be accessed at: https://labrai.github.io/PyGIP. We believe this survey will play a fundamental role in intellectual property protection for GML and provide practical recipes for the GML community.


Validating Terrain Models in Digital Twins for Trustworthy sUAS Operations

Bernal, Arturo Miguel Russell, Petterson, Maureen, Granadeno, Pedro Antonio Alarcon, Murphy, Michael, Mason, James, Cleland-Huang, Jane

arXiv.org Artificial Intelligence

--With the increasing deployment of small Unmanned Aircraft Systems (sUAS) in unfamiliar and complex environments, Environmental Digital Twins (EDT) that comprise weather, airspace, and terrain data are critical for safe flight planning and for maintaining appropriate altitudes during search and surveillance operations. With the expansion of sUAS capabilities through edge and cloud computing, accurate EDT are also vital for advanced sUAS capabilities, like geolocation. However, real-world sUAS deployment introduces significant sources of uncertainty, necessitating a robust validation process for EDT components. This paper focuses on the validation of terrain models, one of the key components of an EDT, for real-world sUAS tasks. These models are constructed by fusing U.S. Geological Survey (USGS) datasets and satellite imagery, incorporating high-resolution environmental data to support mission tasks. V alidating both the terrain models and their operational use by sUAS under real-world conditions presents significant challenges, including limited data granularity, terrain discontinuities, GPS and sensor inaccuracies, visual detection uncertainties, as well as onboard resources and timing constraints. We propose a 3-Dimensions validation process grounded in software engineering principles, following a workflow across granularity of tests, simulation to real world, and the analysis of simple to edge conditions. We demonstrate our approach using a multi-sUAS platform equipped with a T errain-A ware Digital Shadow. As swarms of small Unmanned Aircraft Systems (sUAS) are increasingly deployed in complex, unstructured environments such as disaster zones, wilderness areas, and wildfire regions, the need for accurate environmental models becomes critical. Effective sUAS mission planning requires awareness not only of dynamic airspace and weather conditions but also of the underlying terrain. In such settings, terrain is often the dominant factor influencing flight safety, sensor placement, line-of-sight communications, and search effectiveness. This paper focuses specifically on the role of terrain models that enable mission-level decision-making and flight planning for sUAS operations. However, terrain inaccuracies or blind spots, such as missing elevation data, undetected peaks, or mismatched georeferencing, can result in ineffective or even hazardous behavior by autonomous vehicles. To minimize these issues, we construct and maintain a terrain model by fusing multiple sources of environmental data, including public USGS datasets [1], [2], and satellite imagery [3].


PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play?

Zhou, Lingfeng, Zhang, Jialing, Gao, Jin, Jiang, Mohan, Wang, Dequan

arXiv.org Artificial Intelligence

Current role-play studies often rely on unvalidated LLM-as-a-judge paradigms, which may fail to reflect how humans perceive role fidelity. A key prerequisite for human-aligned evaluation is role identification, the ability to recognize who is speaking based on dialogue context. We argue that any meaningful judgment of role-playing quality (how well a character is played) fundamentally depends on first correctly attributing words and actions to the correct persona (who is speaking). We present PersonaEval, the first benchmark designed to test whether LLM evaluators can reliably identify human roles. PersonaEval uses human-authored dialogues from novels, scripts, and video transcripts, challenging models to determine the correct persona according to the conversation context. Our experiments, including a human study, show that even the best-performing LLMs reach only around 69% accuracy, well below the level needed for reliable evaluation. In contrast, human participants perform near ceiling with 90.8% accuracy, highlighting that current LLM evaluators are still not human enough to effectively judge role-play scenarios. To better understand this gap, we examine training-time adaptation and test-time compute, suggesting that reliable evaluation requires more than task-specific tuning, but depends on strong, human-like reasoning abilities in LLM evaluators. We release our benchmark at https://github.com/maple-zhou/PersonaEval.


A Survey on Model Extraction Attacks and Defenses for Large Language Models

Zhao, Kaixiang, Li, Lincan, Ding, Kaize, Gong, Neil Zhenqiang, Zhao, Yue, Dong, Yushun

arXiv.org Artificial Intelligence

Model extraction attacks pose significant security threats to deployed language models, potentially compromising intellectual property and user privacy. This survey provides a comprehensive taxonomy of LLM-specific extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks. We analyze various attack methodologies including API-based knowledge distillation, direct querying, parameter recovery, and prompt stealing techniques that exploit transformer architectures. We then examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios. We propose specialized metrics for evaluating both attack effectiveness and defense performance, addressing the specific challenges of generative language models. Through our analysis, we identify critical limitations in current approaches and propose promising research directions, including integrated attack methodologies and adaptive defense mechanisms that balance security with model utility. This work serves NLP researchers, ML engineers, and security professionals seeking to protect language models in production environments.


Storm Surge in Color: RGB-Encoded Physics-Aware Deep Learning for Storm Surge Forecasting

Zhao, Jinpai, Cerrone, Albert, Valseth, Eirik, Westerink, Leendert, Dawson, Clint

arXiv.org Artificial Intelligence

Storm surge forecasting plays a crucial role in coastal disaster preparedness, yet existing machine learning approaches often suffer from limited spatial resolution, reliance on coastal station data, and poor generalization. Moreover, many prior models operate directly on unstructured spatial data, making them incompatible with modern deep learning architectures. In this work, we introduce a novel approach that projects unstructured water elevation fields onto structured Red Green Blue (RGB)-encoded image representations, enabling the application of Convolutional Long Short Term Memory (ConvLSTM) networks for end-to-end spatiotemporal surge forecasting. Our model further integrates ground-truth wind fields as dynamic conditioning signals and topo-bathymetry as a static input, capturing physically meaningful drivers of surge evolution. Evaluated on a large-scale dataset of synthetic storms in the Gulf of Mexico, our method demonstrates robust 48-hour forecasting performance across multiple regions along the Texas coast and exhibits strong spatial extensibility to other coastal areas. By combining structured representation, physically grounded forcings, and scalable deep learning, this study advances the frontier of storm surge forecasting in usability, adaptability, and interpretability.


MARK HALPERIN: Democrats try to construct a Frankenstein candidate while JD Vance gains momentum for 2028

FOX News

Democratic strategist James Carville said on Wednesday he doesn't buy it when wealthy Jewish donors tell him they're ditching the Democratic Party because of antisemitism among its members. He says they're doing it for a "f------ tax cut." There are two truths about presidential candidates. One: There is no such thing as a perfect candidate. Two: It is very difficult to convince party elites that there are no perfect candidates.